Uncovering the hidden regularities and organizational principles of networks arising in
physical systems ranging from the molecular level to the scale of large communication
infrastructures is the key issue for the understanding of their fabric and dynamical
properties. The "rich-club" phenomenon refers to the tendency of nodes with high centrality,
the dominant elements of the system, to form tightly interconnected communities, and it is
one of the crucial properties accounting for the formation of dominant communities in both
computer and social sciences. Here we provide the analytical expression and the correct null
models which allow for a quantitative discussion of the rich-club phenomenon. The presented
analysis enables the measurement of the rich-club ordering and its relation with the function
and dynamics of networks in examples drawn from the biological, social and technological
domains.

Recently, the informatics revolution has made possible the analysis of a wide range of
large-scale, rapidly evolving networks such as transportation, technological, social and
biological networks. While these networks are extremely different from each other in their
function and attributes, the analysis of their fabric provided evidence of several shared
regularities, suggesting general and common self-organizing principles beyond the specific
details of the individual systems. In this context, the statistical physics approach has been
exploited as a very convenient strategy because of its deep connection with statistical graph
theory and because of its power to quantitatively characterize macroscopic phenomena in terms
of the microscopic dynamics of the various systems. As an initial discriminant of structural
ordering, attention has been focused on the networks' degree distribution, i.e., the
probability $P(k)$ that any given node in the network shares an edge with $k$ neighboring
nodes. This function is, however, only one of the many statistics characterizing the
structural and hierarchical ordering of a network; a full account of the connectivity pattern
calls for the detailed study of the multi-point degree correlation functions and/or suitable
combinations of these.

In this paper, we tackle a main structural property of complex networks, the so-called
"rich-club" phenomenon. This property has been discussed in several instances in both social
and computer sciences and refers to the tendency of high degree nodes, the hubs of the
network, to be very well connected to each other. Essentially, nodes with a large number of
links - usually referred to as rich nodes - are much more likely to form tight and well
interconnected subgraphs (clubs) than low degree nodes. A first quantitative definition of
the rich-club phenomenon is given by the rich-club coefficient $\phi$, introduced by Zhou and
Mondragon in the context of the Internet. Denoting by $E_{>k}$ the number of edges among the
$N_{>k}$ nodes having degree higher than a given value $k$, the rich-club coefficient is
expressed as:

$$\phi(k) = \frac{2 E_{>k}}{N_{>k} (N_{>k} - 1)} , \tag{1}$$

where $N_{>k}(N_{>k} - 1)/2$ represents the maximum possible number of edges among the
$N_{>k}$ nodes. Therefore, $\phi(k)$ measures the fraction of edges actually connecting those
nodes out of the maximum number of edges they might possibly share. The rich-club coefficient
is a novel probe for the topological correlations in a complex network, and it yields
important information about its underlying architecture. Structural properties, in turn, have
immediate consequences on a network's features and tasks, such as robustness, performance of
biological functions, or selection of traffic backbones, depending on the system at hand. In
a social context, for example, a strong rich-club phenomenon indicates the dominance of an
"oligarchy" of highly connected and mutually communicating individuals, as opposed to a
structure comprised of many loosely connected and relatively independent sub-communities. In
the Internet, such a feature would point to an architecture in which important hubs are much
more densely interconnected than peripheral nodes in order to provide the transit backbone of
the network. It is also worth stressing that the rich-club phenomenon is not trivially
related to the mixing properties of networks, which enable the distinction between
assortative networks, where large degree nodes preferentially attach to large degree nodes,
and disassortative networks, showing the opposite tendency. Indeed, the rich-club phenomenon
is not necessarily associated with assortative mixing. In the top panel of Fig. 1, we sketch
a simple construction in which a disassortative network exhibits the rich-club phenomenon. In
other words, the rich-club phenomenon and the mixing properties express different features
that are not trivially related or derived from one another (the technical discussion of this
point is reported in the Methods section).
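
As an illustration of the definition above, the following minimal sketch computes the
rich-club coefficient, i.e. twice the number of edges among the nodes of degree larger than
$k$ divided by $N_{>k}(N_{>k} - 1)$, for every threshold $k$. The edge-list representation,
the function names and the toy graph (four fully connected hubs, each with three degree-one
neighbors, loosely modeled on the construction sketched in Fig. 1) are assumptions made for
the example and are not taken from the analysis itself.

```python
from collections import defaultdict

def degrees(edges):
    """Return a dict node -> degree for an undirected simple graph given as an edge list."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def rich_club_coefficient(edges):
    """Return {k: phi(k)} with phi(k) = 2 E_{>k} / (N_{>k} (N_{>k} - 1)).

    Thresholds for which fewer than two nodes have degree > k are skipped,
    because phi(k) is undefined there.
    """
    deg = degrees(edges)
    phi = {}
    for k in range(0, max(deg.values())):
        club = {node for node, d in deg.items() if d > k}
        n_club = len(club)
        if n_club < 2:
            break
        # Count the edges with both end points inside the club.
        e_club = sum(1 for u, v in edges if u in club and v in club)
        phi[k] = 2.0 * e_club / (n_club * (n_club - 1))
    return phi

# Hypothetical toy graph: 4 hubs forming a clique, each hub with 3 leaves.
hubs = ["h0", "h1", "h2", "h3"]
toy = [(hubs[i], hubs[j]) for i in range(4) for j in range(i + 1, 4)]
toy += [(h, f"{h}_leaf{m}") for h in hubs for m in range(3)]

phi = rich_club_coefficient(toy)
for k in sorted(phi):
    print(k, phi[k])
```

In this toy graph every hub attaches mostly to degree-one nodes, yet the club of hubs is a
complete subgraph, so $\phi(k)$ reaches its maximum value of 1 as soon as the leaves are
excluded.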

In Fig. 1, we report the behavior of the rich-club coefficient as a function of the degree in
a variety of real-world networks drawn from the biological, social and technological world.
In Table 1, we summarize the basic topological features of these networks and the datasets
used. We also consider three standard network models: the Erdős-Rényi (ER) graph, the
generalized random network having a heavy-tailed degree distribution obtained with the
Molloy-Reed (MR) algorithm, and the Barabási-Albert (BA) model. In the ER graph, $N$ nodes
are connected by $E$ edges randomly chosen with probability $p$ out of the $N(N - 1)/2$
possible pairs of nodes. The MR network is obtained starting from a given degree sequence
$P(k)$ (in our case $P(k) \sim k^{-\gamma}$ with $\gamma = 3$) by randomly connecting nodes
with the constraints of avoiding self-loops and multiple edges. The BA model is generated by
using the preferential-attachment growing algorithm, which produces a scale-free graph with
power-law degree sequence with exponent $\gamma = 3$. In all cases, the generated networks
have $N$ = 10^ vertices and an average degree $\langle k \rangle = 6$.

As is evident from Fig. 1, the monotonic increase of $\phi(k)$ is a feature shared by all the
analyzed datasets. This behavior is claimed to provide evidence of the rich-club phenomenon
since $\phi(k)$ progressively increases with increasing vertex degree (e.g., see Zhou and
Mondragon for the Internet case, where a different representation of the function is adopted
with $\phi$ defined in terms of the rank $r$ of nodes sorted by decreasing degree values).
However, a monotonic increase of $\phi(k)$ does not necessarily imply the presence of the
rich-club phenomenon. Indeed, even in the case of the ER graph - a completely random network -
we find an increasing rich-club coefficient. This implies that the increase of $\phi(k)$ is a
natural consequence of the fact that vertices with large degree have a larger probability of
sharing edges than low degree vertices. This feature is therefore imposed by construction and
does not represent a signature of any particular organizing principle or structure, as is
clear in the ER case. The simple inspection of the $\phi(k)$ trend is therefore potentially
misleading in the discrimination of the rich-club phenomenon.
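
The ER observation can be checked numerically with a short sketch. The generator below draws
a fixed number of distinct edges uniformly at random, which is one common way of building an
ER-type graph and is assumed here for simplicity; the size ($N = 2000$, much smaller than the
networks analyzed in the text), the seed and the function names are likewise illustrative
choices.

```python
import random
from collections import Counter

def er_graph(n, m, seed=0):
    """Random graph with n nodes and m distinct edges drawn uniformly among all pairs."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < m:
        u, v = rng.sample(range(n), 2)       # two distinct nodes, so no self-loops
        edges.add((min(u, v), max(u, v)))    # the set forbids multiple edges
    return sorted(edges)

def phi_at(edges, k):
    """Rich-club coefficient 2 E_{>k} / (N_{>k} (N_{>k} - 1)) at one threshold k."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    club = {node for node, d in deg.items() if d > k}
    if len(club) < 2:
        return None  # undefined with fewer than two rich nodes
    inside = sum(1 for u, v in edges if u in club and v in club)
    return 2.0 * inside / (len(club) * (len(club) - 1))

edges = er_graph(n=2000, m=6000, seed=1)     # average degree 2m/n = 6
for k in (0, 4, 8, 10):
    print(k, phi_at(edges, k))
```

Even though no organizing principle is built into this graph, the printed values should grow
with $k$, because conditioning on a large degree already raises the chance that two nodes
share an edge.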

In order to find suitable baselines for the detection of the rich-club phenomenon we focus on
the theoretical analysis of $\phi(k)$. In the Methods section we derive an expression for the
rich-club coefficient as a function of the convolution of the two-vertex degree correlation
function $P(k, k')$. Interestingly, it is possible to obtain an explicit expression for the
rich-club coefficient of random uncorrelated networks. In this case, the two-vertex
correlation function is a simple function of the degree distribution, yielding the following
behavior for uncorrelated large size networks at large degrees:

$$\phi_{unc}(k) \underset{k, k_{max} \to \infty}{\sim} \frac{k^2}{\langle k \rangle N} , \tag{2}$$

where $k_{max}$ is the maximum degree present in the network. Eq. (2) shows unequivocally that
the rich-club coefficient is also a monotonically increasing function for uncorrelated
networks, so that, in order to assess the presence of rich-club structural ordering, it is
necessary to compare it with the one obtained from the appropriate null model with the same
degree distribution, thus providing a suitable normalization of $\phi(k)$.

From the previous discussion, a possible choice for the normalization of the rich-club
coefficient is provided by the ratio $\rho_{unc}(k) = \phi(k)/\phi_{unc}(k)$, where
$\phi_{unc}(k)$ is analytically calculated by inserting in Eq. (4), reported in the Methods
section, the network's degree distribution $P(k)$. A ratio larger than one is the actual
evidence for the presence of a rich-club phenomenon, leading to an increase in the
interconnectivity of large degree nodes in a more pronounced way than in the random case. On
the contrary, a ratio $\rho_{unc}(k) < 1$ is a signature of an opposite organizing principle
that leads to a lack of interconnectivity among large degree nodes. On the other hand, a
completely degree-degree uncorrelated network with finite size is not always realizable due
to structural constraints. Indeed, any finite size random network presents a structural
cut-off value $k_s$ over which the requirement of the lack of dangling edges introduces the
presence of multiple and self-connections and/or degree-degree correlations. Networks with
bounded degree distributions and finite second moment $\langle k^2 \rangle$ present a
$k_{max}$ that is below the structural one $k_s$. In this situation, $\phi_{unc}(k)$ is
properly defined for all degrees and is representative of the network topology. However, in
networks with heavy-tailed degree distribution (e.g., scale-free degree distributions with
$2 < \gamma < 3$, as observed in many real systems), this is no longer the case and $k_s$ is
generally smaller than $k_{max}$. In fact, structural degree-degree correlations and higher
order effects, such as the emergence of large cliques, set in even in completely random
networks. The normalization of $\phi(k)$ that takes into account these effects is provided by
the expression $\rho_{ran}(k) = \phi(k)/\phi_{ran}(k)$, where $\phi_{ran}(k)$ is the
rich-club coefficient of the maximally random network with the same degree distribution
$P(k)$ as the network under study. Operatively, the maximally random network can be thought
of as the stationary ensemble of networks visited by a process that, at any time step,
randomly selects a pair of links of the original network and exchanges two of their end
points (automatically preserving the degree distribution). Also in this case an actual
rich-club ordering is denoted by a ratio $\rho_{ran}(k) > 1$. Therefore, whereas
$\rho_{unc}(k)$ provides information about the overall rich-club ordering in the network with
respect to an ideally uncorrelated graph, $\rho_{ran}(k)$ is a normalized measure which
discounts the structural correlations due to unavoidable finite size effects, providing a
better discrimination of the actual presence of the rich-club phenomenon due to the ordering
principles shaping the network.
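
The link-exchange process described above can be sketched as follows. This is a minimal
illustration under stated assumptions, not the procedure actually used for Fig. 2: the
rejection of swaps that would create self-loops or multiple edges, the number of swaps
(ten per edge), the number of realizations and all function names are choices made for the
example, and the toy graph (four fully connected hubs of degree 6 attached to a ring of
degree-3 nodes) is hypothetical.

```python
import random
from collections import Counter

def degree_map(edges):
    """Node -> degree for an undirected simple graph given as an edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def phi_at(edges, k):
    """Rich-club coefficient 2 E_{>k} / (N_{>k} (N_{>k} - 1)) at one threshold k."""
    club = {node for node, d in degree_map(edges).items() if d > k}
    if len(club) < 2:
        return None  # undefined with fewer than two rich nodes
    inside = sum(1 for u, v in edges if u in club and v in club)
    return 2.0 * inside / (len(club) * (len(club) - 1))

def rewire(edges, n_swaps, rng):
    """Pick two links (a,b), (c,d) and replace them by (a,d), (c,b), keeping all degrees."""
    edge_list = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edge_list}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c                            # random orientation of the second link
        new_1, new_2 = frozenset((a, d)), frozenset((c, b))
        if len(new_1) < 2 or len(new_2) < 2:       # swap would create a self-loop
            continue
        if new_1 in present or new_2 in present:   # swap would create a multiple edge
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new_1, new_2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list

def rho_ran(edges, k, n_realizations=20, seed=0):
    """phi(k) divided by its average over rewired copies; k must leave >= 2 rich nodes."""
    rng = random.Random(seed)
    samples = [phi_at(rewire(edges, 10 * len(edges), rng), k)
               for _ in range(n_realizations)]
    mean_ran = sum(samples) / n_realizations
    return phi_at(edges, k) / mean_ran if mean_ran > 0 else float("inf")

# Hypothetical toy graph: a clique of 4 hubs, 3 leaves per hub, leaves joined in a ring.
hubs = [f"h{i}" for i in range(4)]
leaves = [f"l{j}" for j in range(12)]
toy = [(hubs[i], hubs[j]) for i in range(4) for j in range(i + 1, 4)]
toy += [(hubs[j // 3], leaves[j]) for j in range(12)]
toy += [(leaves[j], leaves[(j + 1) % 12]) for j in range(12)]

print(rho_ran(toy, k=3))
```

Because the hubs of the toy graph form a complete subgraph, $\phi(3) = 1$ is the largest
possible value, so the ratio to the rewired average cannot fall below 1; for real data the
ratio can lie on either side of 1.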

In Fig. 2, we report the ratios $\rho_{ran}(k)$ for the real-world and the simulated
networks. The analysis clearly discriminates between networks with or without rich-club
ordering. In particular, we identify a strong rich-club ordering in the Scientific
Collaboration Network, providing support to the idea that the elite formed by more
influential scientists tends to form collaborative groups within specific domains. This also
supports the view that the rich-club phenomenon is a natural tendency in many social
networks. We find a clearly opposite result in the decreasing behavior of the rich-club
spectrum for the Protein Interaction Network and the Internet map at the Autonomous System
level. In both cases, this evidence provides interesting information regarding the system
structure and function.

The lack of rich-club ordering in the Protein Interaction Network indicates that proteins
with a large number of interactions are presiding over different functions and thus, in
general, are coordinating specific functional modules (whose detailed analysis requires
specific tools). Figure 3 shows portions of the Protein Interaction Network and the
Scientific Collaboration Network including the club of $N_{>k}$ nodes - $N_{>k} = 29$ and
$N_{>k} = 35$ for the Protein Interactions, $N_{>k} = 30$ and $N_{>k} = 36$ for the
Scientific Collaboration - and the connections among them. The network representations
clearly show the presence of a rich-club phenomenon in the Scientific Collaboration Network,
where the majority of rich nodes are highly interconnected forming tight subgraphs, in
contrast with the Protein Interaction Network case, where only few links appear to connect
rich nodes, the rest linking to lower degree vertices.

In the case of the Internet, the appropriate analysis of the rich-club phenomenon shows that,
contrary to previous claims, the structure at the Autonomous System level lacks rich-club
ordering. This might appear counter-intuitive. It is reasonable to imagine the Internet
backbone made of interconnected transit providers which are also local hubs. This is however
not the case, and an explanation can be easily found in the fact that we are just considering
topological properties. Indeed, the backbone hubs are identified more in terms of their
bandwidth and traffic capacity than in terms of the sole number of connections. The present
result suggests that high degree hubs provide connectivity to local regions of the Internet
and are not tightly interconnected. The backbone of interconnected transit providers is
instead identified by high traffic links which play a crucial role in terms of traffic
capacities but whose number might represent a small fraction of the total possible number of
interconnections.

The previous discussion points out that, in some cases, the concept of rich-club ordering
should be generalized in order to evaluate the richness of vertices not just in terms of
their degree but in terms of the actual traffic or intensity of interactions handled. In this
case, we have to consider a weighted network representation of the system where a weight
$w_{ij}$ representing the traffic or intensity of interaction is associated with each edge
between the vertices $i$ and $j$. Also in this case, however, the study of the weighted
rich-club coefficient alone does not discriminate the actual presence of the rich-club effect
(see Methods). Given the entanglement of the weight and degree correlations, the appropriate
null hypothesis is however more complicated to define, and a detailed account of the
evaluation of the weighted rich-club effect will be provided elsewhere.

In summary, the presented analysis provides the baseline functions for the detection of the 
rich-club phenomenon and its effect on the structure of large scale networks. This allows the 
measurement of this effect in a wide range of systems, finally enabling a quantitative discussion 
of various claims such as "high centrality" backbones in technological networks and "elitarian" 
clubs in social systems. 



Methods 

Analytic expression of the rich-club coefficient. The basic analytical understanding of the
rich-club phenomenon starts by considering the quantity $E_{kk'}$, representing the total
number of edges between vertices of degree $k$ and of degree $k'$ for $k \neq k'$, and twice
the number of edges between vertices in the same degree class. We can express the numerator
of $\phi(k)$ in Eq. (1) as
$2E_{>k} = \int_k^{k_{max}} dk' \int_k^{k_{max}} dk'' E_{k'k''}$, where $k_{max}$ is the
maximum degree present in the network and where, for the sake of simplicity, the variable $k$
is thought of as continuous. In turn, the quantity $E_{kk'}$ can be expressed as a function
of the joint degree probability distribution via the identity
$N \langle k \rangle P(k, k') = E_{kk'}$, yielding

$$\phi(k) = \frac{N \langle k \rangle \int_k^{k_{max}} dk' \int_k^{k_{max}} dk'' P(k', k'')}{\left[ N \int_k^{k_{max}} dk' P(k') \right] \left[ N \int_k^{k_{max}} dk' P(k') - 1 \right]} . \tag{3}$$

From Eq. (3), it is clear that $\phi(k)$ is also a measure of correlations in the network,
although it represents a different projection of $P(k, k')$ as compared to other
degree-degree correlation measures. At the same time, it is possible to see that the
rich-club coefficient expresses a property that is not trivially related to the usual
indicators of assortative behavior, such as Pearson's correlation coefficient or the average
nearest neighbor degree. Notice that these assortativity measures quantify two-point
correlations and so account for quasi-local properties of the nodes in the network, whereas
the rich-club phenomenon is computed as a global feature within a restricted subset. The
double integral is indeed a convolution of the correlation function that allows the presence
of different combinations of the assortative and rich-club features in the same network.

Only in the case of random uncorrelated networks, the joint degree distribution $P(k, k')$
factorizes and takes the simple form
$P_{unc}(k, k') = k k' P(k) P(k') / \langle k \rangle^2$. By inserting this expression into
Eq. (3), we obtain $\phi(k)$ for uncorrelated networks as



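$$\phi_{unc}(k) = \frac{1}{N \langle k \rangle} \left[ \frac{\int_k^{k_{max}} dk' \, k' P(k')}{\int_k^{k_{max}} dk' P(k')} \right]^2 \underset{k, k_{max} \to \infty}{\sim} \frac{k^2}{\langle k \rangle N} , \tag{4}$$

where we have applied L'Hôpital's rule to derive the behavior for large size networks and
large degrees.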
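
For a finite network, the uncorrelated baseline can be evaluated directly from the degree
sequence. The sketch below replaces the integrals over $P(k')$ by sums over the nodes with
degree larger than $k$, so that the ratio of integrals becomes the mean degree of the rich
nodes and $N \langle k \rangle$ becomes the sum of all degrees. This discrete reading of the
large-size expression is an assumption of the example, and the function name is illustrative.

```python
def phi_unc(degree_sequence, k):
    """Uncorrelated rich-club baseline, discrete large-size form:
    (mean degree of the nodes with degree > k)^2 / (N <k>)."""
    club = [d for d in degree_sequence if d > k]
    if not club:
        return None  # no node has degree larger than k
    mean_club_degree = sum(club) / len(club)
    # N <k> equals the sum of all degrees, i.e. twice the number of edges.
    return mean_club_degree ** 2 / sum(degree_sequence)

# A homogeneous sequence: 1000 nodes of degree 6, close to the edge density 6/999.
print(phi_unc([6] * 1000, 0))

# A strongly heterogeneous sequence: 96 nodes of degree 1 and 4 hubs of degree 50.
print(phi_unc([1] * 96 + [50] * 4, 1))
```

In the second case the baseline exceeds 1, which cannot be a fraction of realized edges. This
is the kind of situation in which a fully uncorrelated network of the given degree sequence
is not realizable and the comparison with the maximally random network becomes the more
appropriate one.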

Rich-club coefficient for weighted networks. If the rich-club is defined as the set of nodes
having a strength larger than a given value $s$, a possible definition of the weighted
rich-club coefficient can be expressed as

$$\phi^w(s) = \frac{2 W_{>s}}{\sum_{i | s_i > s} s_i} , \tag{5}$$

where $W_{>s}$ represents the sum of the weights on the links connecting two nodes in the
club and the normalization is given by the sum of the strengths of the rich nodes.
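
A minimal sketch of this weighted coefficient is given below. It assumes the form
$2 W_{>s} / \sum_{i | s_i > s} s_i$, with the strength $s_i$ of a node taken as the sum of
the weights of its links; the factor 2, which makes the coefficient equal to 1 when all the
strength of the rich nodes lies inside the club, the function name and the toy weights are
assumptions of the example.

```python
from collections import defaultdict

def weighted_rich_club(weighted_edges, s):
    """Weighted rich-club coefficient for the club of nodes with strength > s.

    weighted_edges is a list of (u, v, w) triples of an undirected weighted graph.
    """
    strength = defaultdict(float)
    for u, v, w in weighted_edges:
        strength[u] += w
        strength[v] += w
    club = {node for node, s_node in strength.items() if s_node > s}
    total_strength = sum(strength[node] for node in club)
    if total_strength == 0:
        return None  # empty club, coefficient undefined
    # Sum of the weights on links whose two end points are both rich.
    w_inside = sum(w for u, v, w in weighted_edges if u in club and v in club)
    return 2.0 * w_inside / total_strength

# Hypothetical example: a heavy triangle a-b-c with one light pendant link per corner.
example = [("a", "b", 5.0), ("b", "c", 5.0), ("a", "c", 5.0),
           ("a", "x", 1.0), ("b", "y", 1.0), ("c", "z", 1.0)]
print(weighted_rich_club(example, 1.0))   # club = {a, b, c}
print(weighted_rich_club(example, 0.0))   # club = all nodes
```

With the threshold at 0 every node belongs to the club and the coefficient is trivially 1,
which is one way to see that the value alone does not signal a rich-club effect without a
null model.
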
Table legend

Table 1: Basic topological properties of the analyzed datasets. We considered four real-world
networks: (1) the Protein Interaction Network of the yeast Saccharomyces cerevisiae,
collected with different experimental techniques and documented at the Database of
Interacting Proteins (http://dip.doe-mbi.ucla.edu/); (2) the Scientific Collaboration Network
extracted from the electronic database e-Print Archive in the area of condensed matter
physics (http://xxx.lanl.gov/archive/cond-mat/), from 1995 to 1998, in which nodes represent
scientists and a connection exists if they coauthored at least one paper in the archive;
(3) the network of Worldwide Air Transportation, representing the International Air Transport
Association (http://www.iata.org/) database of airport pairs connected by direct flights for
the year 2002; (4) the Internet network at the Autonomous System level from data collected by
the Oregon Route Views project (http://www.routeviews.org/) in May 2001, in which nodes
represent Internet service providers and edges connections among those. The sizes of the
networks in number of nodes and edges are shown, along with the average degree
$\langle k \rangle$ and the maximum degree value $k_{max}$. We also give the value for the
corresponding structural cut-off, $k_s$, in the uncorrelated case.



Figure Legends 

Figure 1: Schematic picture of the rich-club phenomenon and rich-club spectrum $\phi(k)$ for
real networks. At the top, a conceptual example of a disassortative network displaying the
presence of the rich-club phenomenon is shown. Disassortative mixing is given by the tendency
of hubs to be on average more likely connected to low degree nodes. However, the four rich
nodes represented in the schematic picture show a clear rich-club behavior by forming a fully
connected clique within the club. At the bottom, results for the four real-world networks and
the three models analyzed are shown. The computer generated networks - ER, MR, and BA - have
size $N$ = 10^ and average degree $\langle k \rangle = 6$. ER refers to the Erdős-Rényi
graph, MR is constructed from the Molloy-Reed algorithm with a given degree distribution
$P(k) \sim k^{-3}$, and the BA model is generated by growing a network with preferential
attachment that produces a scale-free graph with power-law degree sequence with exponent
$\gamma = 3$. Results are averaged over $n$ = 10^ different realizations for each model. All
networks share a monotonic increasing behavior of $\phi(k)$, independent of the nature of the
degree distribution characterizing the network and of the possible presence of underlying
structural organization principles. Random networks, either having a Poissonian degree
distribution (such as ER) or a heavy-tailed $P(k)$ (such as MR and BA), also show a rich-club
spectrum increasing with increasing values of the degree. This common trend is indeed due to
an intrinsic feature of every network structure, for which hubs simply have a larger
probability of being more interconnected than low degree nodes.

Figure 2: Assessment for the presence of the rich-club phenomenon in the networks under
study. $\phi(k)$ is compared to the null hypothesis provided by the maximally random network
with $\phi_{ran}(k)$. The ratio $\rho_{ran} = \phi/\phi_{ran}$ is plotted as a function of
the degree $k$ and compared to the baseline value equal to 1. If $\rho_{ran}(k) > 1$ ($< 1$)
the network displays the presence (absence) of the rich-club phenomenon with respect to the
random case. The Protein Interaction Network, the Internet map at the Autonomous System level
and the Scientific Collaboration Network show clear behaviors as explained in the main text.
The Worldwide Air Transportation network displays a mild rich-club ordering with
$\rho_{ran}(k) > 1$. The ER and MR network models show a ratio $\rho_{ran}(k) = 1$ for all
$k$, as expected, whereas the BA model exhibits a mixing behavior with values above 1 for
very high degrees.

Figure 3: Graph representations of the rich-clubs. Progressively smaller clubs of $N_{>k}$
rich nodes in the Protein Interaction Network (top) and in the Scientific Collaboration
Network (bottom) are shown together with the $E_{>k}$ connections among them. Here
$N_{>k} = 35$, $E_{>k} = 37$ (top left) and $N_{>k} = 29$, $E_{>k} = 21$ (top right) for the
Protein Interactions; $N_{>k} = 36$, $E_{>k} = 62$ (bottom left) and $N_{>k} = 30$,
$E_{>k} = 54$ (bottom right) for the collaboration network. The two graph representations for
each network show progressively smaller clubs made of $N_{>k}$ rich nodes for increasing
values of the degree $k$. The links connecting the rich nodes to the rest of the network are
not represented for the sake of simplicity. The Protein Interaction Network shows a club
whose hubs are relatively independent, being loosely connected to each other, leaving the
remaining links to coordinate specific functional modules. A different picture is observed in
the Scientific Collaborations case, where most of the hubs form cliques and tightly
interconnected subgraphs, thus revealing the tendency of scientists to form densely
interconnected collaborative groups. The graphs have been produced with the Pajek software
(http://vlado.fmf.uni-lj.si/pub/networks/pajek/).